Abstract: In the past decade, Robotic-Assisted Surgery
(RAS) has become a widely accepted alternative to
traditional open surgery. An ideal robotic assistant
should combine human and robot capabilities under human
control. In practice, the robot should collaborate with
surgeons in a natural and autonomous way, thus requiring
less of the surgeons’ attention. In this survey, we
provide a comprehensive and structured review of
robotic-assisted surgery and autonomous camera movement
for RAS operation. We also discuss several closely
related topics, including but not limited to task and
gesture recognition, and illustrate several successful
applications in various real-world domains. We hope that
this paper will provide a more thorough understanding of
the recent advances in camera automation in RAS and offer
some future research directions.

Keywords: Robotic-assisted surgery, autonomous camera
movement, task and gesture recognition.

I. INTRODUCTION 

The operating room is a central unit of a hospital, where
surgical operations are performed. It is a challenging
work environment that requires intense cooperation and
coordination between a wide range of people and
departments [1]. Surgery is continuously subject to
technological and medical innovations, illustrated by the
accelerated development and introduction of new imaging
technologies, advanced surgical tools, and navigation and
patient monitoring systems [2]. The purpose of these
advances is to improve patient treatment while turning
complexity into daily routine [3]. The ultimate goal of
robotic minimally invasive surgery (RMIS) is to program
the surgical robot to perform certain difficult or complex
procedures autonomously. However, there is currently no
technical roadmap to a fully autonomous surgical system
[4], [5].
Surgical procedures are commonly categorized by
urgency, type of procedure, body system involved, special
instrumentation and degree of invasiveness. At the low
end of invasiveness is Minimally Invasive Surgery (MIS),
which involves a small outer incision to insert
miniaturized instruments, remote manipulation of those
instruments, and indirect observation of the surgical
field through a camera (e.g. an endoscope or
laparoscope). It is carried out through the skin or
through a body cavity or anatomical opening. In contrast,
an open surgical procedure or laparotomy requires a large
incision to access the area of interest. In MIS, instead
of long straight-line incisions, small cuts are made
through which the surgical instruments are inserted. This
minimizes both the bleeding during the operation and the
scarring afterwards. After MIS, a patient may require only
a small bandage on the incision, rather than multiple
stitches or staples to close a large incision. This
usually results in less infection, a quicker recovery and
a shorter hospital stay, or allows outpatient treatment
[4].

In an age in which technology and robots influence every
part of our lives, surgery is no exception. Minimally
invasive surgery can be done either manually or using a
robotic system. Robotic surgery, computer-assisted
surgery, and robotically-assisted surgery are terms for
technological developments that use robotic systems to aid
in surgical procedures. Minimally invasive robotic surgery
provides additional advantages over conventional
laparoscopic surgery, including increased dexterity
[5]-[7] and precision [8].

Current RAS systems operate in a master-slave mode,
relying exclusively on direct surgeon input [8]. For
example, camera control in current RAS platforms is an
additional task under direct control of the surgeon. In
the current FDA-approved system, the da Vinci surgical
platform (Intuitive Surgical, Sunnyvale, CA, USA) [12],
many interface parameters are set once and remain at the
same level throughout the operation, even though different
surgical tasks and motions may require different camera
behaviors [13]. In robotic-assisted surgery, instead of
directly moving the instruments, the surgeon controls them
in one of two ways: through a direct tele-manipulator or
through computer control [9]. A tele-manipulator is a
remote manipulator that allows the surgeon to perform the
normal movements associated with the surgery while the
robotic arms carry out those movements using end-effectors
and manipulators to perform the actual surgery on the
patient. In computer-controlled systems, the surgeon uses
a computer to control the robotic arms and their
end-effectors, though these systems can still use
tele-manipulators for their input. One advantage of the
computerized method is that the surgeon does not have to
be physically present.

One form of surgical robot is the teleoperated robot,
whose functions are controlled remotely by a human
operator. The control signals can be sent through a wire,
a local wireless system, over the Internet or by
satellite. Teleoperated robots are probably the most
common type of medical robot today. They are typically
controlled by a surgeon or doctor and allow them to
perform tasks and treatments that they would not normally
be able to do.

Some advanced systems not only have internal cameras but
can also use additional imaging technologies such as MRI
to give the surgeon a real-time view of exactly where in
the body the instruments are. This gives the surgeon a
high level of control over where the instruments are
directed. Examples of surgical robots include the
Neuromate stereotactic robot (Renishaw Inc.) for assisting
in neurological surgeries, the da Vinci (Intuitive
Surgical Inc., CA, USA) and the Zeus robotic surgical
system (Computer Motion Inc., Goleta, CA, USA). The da
Vinci Surgical System, introduced in 1999, is becoming a
standard in the field of minimally invasive surgery. Some
advantages of this system are better visualization,
improved control and reduced surgeon fatigue [10].
Surgical robotics enables the surgeon to operate in a
tele-operation mode, with or without force feedback, using
a master/slave system configuration [11]. In this mode of
operation, visualization is obtained from either an
external camera or an endoscopic camera.

Task analysis is the analysis of how a task is 
accomplished, including a detailed description of both 
manual and mental activities. Task analysis emerged from
research in applied behavior analysis and is still
actively studied in that area [12]. The importance of
using a standard task analysis method is that it provides a 
reproducible framework for breaking down a process 
following a structured technique. This enables developing 
a shared understanding or framework for a task, and 
communicates analysis results in a reproducible and 
widely understood manner in the industrial engineering 
and ergonomics communities. 



Fig. 1: Illustration of robotic surgery platform

From a task analysis, a vocabulary can be drawn to
describe an entire process, ensuring that all involved
personnel use the same vocabulary and the same
interpretation of each task and subtask definition. Task
analysis provides a representation of the operations that
are required to accomplish a goal. This is especially
critical when a designer aims to change or enhance a
procedure, product, or system. Without a thorough
mapping of an objective and its subtasks, it can be
difficult to anticipate the effects that a change may have
on a system [12], [13]. It is, however, quite clear that
developing any automatic control system requires a more
detailed comprehension of the surgical procedures [14].

II. LITERATURE REVIEW

As the camera positioning problem is highly multi- 
disciplinary, we decided to present the different related 
areas. We first briefly present literature addressing 
gesture recognition and segmentation, with a focus on 
minimally invasive surgery and surgeon gesture 
classification based on task analysis. We then focus on 
camera positioning and zooming level during 
laparoscopic surgery. 

2.1 Surgical task recognition 

Recognition of surgical procedures at different levels of
granularity has recently attracted the interest of
researchers [14]. Most work focuses on phase recognition,
and different papers use different methods to recognize
surgical phases [14]. Forestier et al. [15] used dynamic
time warping to classify surgical processes. Lange et al.
[16] performed phase recognition in an operating room
using sensor technology. Workflow and activity modeling
have also been studied [17] in order to monitor surgical
procedures.

With all these systems, the information gathered
incorporates end-effector data to some extent. Whether the
information is gathered from magnetic motion tracking of a
hand holding manual MIS instruments, or the end-effector
trajectories are read from the da Vinci Application
Programming Interface, all data is collected from the
end-effector. Using end-effector information has several
advantages but also some limitations. On the positive
side, it helps reduce the effects of other factors,
including fatigue that can add hand tremor, the effects of
motion scaling, etc. However, although previous work
successfully demonstrated that such a system could
identify surgical gestures with great accuracy, the lack
of variability in the data raises questions about the
robustness of the system. The limited number of gestures
identified in that work does not account for noise that
may stem from mistakes or deviations during surgery. If,
for instance, a surgeon makes a poor stitch and must
correct it by undoing it, the classification system would
certainly misclassify the task for lack of a correct
option to choose from. In addition, a major weakness of
the approaches discussed above is that they do not rely on
a structured decomposition of the task.

To make a classification system more robust, Golenberg et
al. [12] developed a Hierarchical Task Analysis of a
robotically assisted four-throw suturing task and
presented a classification system, based on rudimentary
hand movements, that automatically identifies 24 surgeon
subtasks from a library with an accuracy of 94.56%. The
importance of using a structured task analysis method is
that it provides a reproducible framework that is
consistent and can be generalized to other platforms.
Using a structured approach also makes data more
acceptable and interpretable, since the creation of a
gesture breakdown follows guidelines and rules.
Additionally, a thorough task analysis could help make a
system less brittle to the less common gestures that occur
during surgery deviations, errors, and error recovery.

On the one hand, the ability of current robotic surgery
systems to record quantitative motion and video data
motivates the development of descriptive mathematical
models to recognize and analyze surgical tasks. On the
other hand, recent advances in machine learning research
for uncovering hidden patterns in large data sets, such as
kinematic and video data, offer a possibility to better
understand surgical procedures from a system point of
view. Accordingly, a distance-based time series
classification framework for task recognition has been
developed [18].
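
As an illustration of distance-based time series classification, the
following minimal sketch pairs dynamic time warping (the distance used
in [15]) with a nearest-neighbor rule. It is not the framework of [18];
the function names, the gesture labels and the two-dimensional toy
traces are hypothetical and stand in for real multi-channel kinematic
recordings.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of points.

    Each sequence is a list of equal-length tuples (one tuple per sample).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean sample distance
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify_1nn(query, templates):
    """Return the label of the template closest to query under DTW.

    templates is a list of (label, sequence) pairs.
    """
    return min(templates, key=lambda item: dtw_distance(query, item[1]))[0]


# Hypothetical labeled traces (assumption: 2-D tool-tip positions).
templates = [
    ("reach", [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]),
    ("pull", [(3.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 3.0)]),
]
# Same path as "reach" but performed more slowly (repeated samples).
query = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

print(classify_1nn(query, templates))
```

Because the warping absorbs differences in execution speed, the slower
query still matches the "reach" template exactly, which is the property
that makes this kind of distance attractive for surgical motion data.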

2.2 Surgical gesture recognition and segmentation

Gesture recognition is a topic in computer science and
language technology whose goal is to interpret human
gestures using mathematical algorithms. Human gesture
recognition is a large research domain that has been
studied widely in recent decades. The trend is strongly
motivated by the wide variety of applications concerned
with understanding human gestures, such as human-machine
interaction and medical monitoring. Gesture recognition
enables humans to communicate with a machine and interact
naturally without any mechanical devices. Several methods
have been used for gesture recognition, such as template
matching [19], dictionary lookup [20], statistical
matching [21], [22], linguistic matching [23], neural
networks [24], and ad hoc methods. The key problem in
gesture recognition is how to make gestures understood by
computers, for example how to make a computer understand
a hand or head gesture. For hand gesture recognition,
existing approaches can be divided mainly into
“Glove-Based” and “Vision-Based” approaches. Glove-based
methods use sensor devices to capture hand and finger
motions as multi-parametric data. However, the devices are
quite expensive and cumbersome for users [25]. In
contrast, vision-based methods require only a camera [26]
to realize natural interaction between humans and
computers, with no need for any extra devices. Many
studies have been done on vision-based hand gesture
recognition for human-computer interaction, consolidating
the various available approaches and pointing out their
general advantages and disadvantages [27], [28].

In the area of surgery, significant research has been
conducted over the past ten years on recognizing surgeons’
gestures. Gestures have been assessed in many studies by
tracking either the surgeon’s body motion in the operating
room [29] or hand motion while performing a specific
surgical task [11], [30], [31]. The Imperial College
Surgical Assessment Device (ICSAD) system tracks the
surgeon’s hand motions during surgery using
electromagnetic markers [31]. In related work, [32]
focuses on the analysis of kinematic parameters of motion,
including translation and rotation of both the tool and
the camera.

Several research groups have examined movement 
characteristics directly, seeking low-level signal 
processing features that can be used to automatically 
differentiate surgeons into different skill levels [33]. Lin 
et al. [30] used a neural network modeling approach to
classify signals recorded on the da Vinci surgical robot
into the eight surgeon gestures listed below:

1) Reach for needle 

2) Position needle 

3) Insert and push needle through tissue 

4) Move to middle with needle (left hand) 

5) Move to middle with needle (right hand) 

6) Pull suture with left hand 

7) Pull suture with right hand 

8) Orient needle with both hands 

Reiley et al. [34] extended [30], again using the da
Vinci but with a larger participant pool. They used eleven
surgical gestures, adding three gestures to Lin’s
vocabulary: right hand assisting left while pulling
suture, loosen up more suture, and end trial. These
gestures were added out of necessity, based on their
observations of surgery.

In manual minimally invasive surgery, the signals are
often recorded through magnetic trackers or color
markers. Cristancho [35] used a Polhemus 3SPACE Fastrak
6-DOF electromagnetic system to track conventional manual
laparoscopic tools and used Principal Component Analysis
(PCA) to determine the main contributors to overall task
variability. Richards et al. [36] applied force and
torque sensors to manual laparoscopic tools and found a
significant difference in the force and torque signatures
of basic movements between novice and expert surgeons.
With the advent of new technology for capturing data, more
sophisticated machine learning methods have been developed
[37], [38].
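
To make the role of PCA in such an analysis concrete, the sketch below
estimates the first principal component of a small set of tracked
samples and the fraction of total variance it explains. It is only an
illustration, not the analysis of [35]: power iteration is swapped in
for a full eigen-decomposition to keep the example dependency-free, and
the three channels (x position, y position, grip force) and their
values are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained variance fraction) of the first PC.

    rows is a list of equal-length tuples, one tuple per recorded sample.
    Power iteration is used; it assumes the start vector is not orthogonal
    to the dominant direction, which holds for the toy data below.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all in the data
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along v (v has unit length).
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total = sum(cov[i][i] for i in range(d))
    return v, (eigenvalue / total if total else 0.0)


# Hypothetical samples: (x position, y position, grip force).
rows = [
    (0.0, 0.1, 1.0),
    (1.0, 0.0, 1.1),
    (2.0, 0.1, 0.9),
    (3.0, 0.0, 1.0),
    (4.0, 0.1, 1.0),
]
direction, fraction = first_principal_component(rows)
print(direction, fraction)
```

In this toy data almost all variability lies along the first channel,
so the first component points nearly along x; on real recordings the
component loadings would indicate which measured quantities contribute
most to task variability.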

2.3 Camera movement and positioning

Visualization of the surgical field is vital to a
successful operation in both open and laparoscopic
surgery. Whereas during open procedures surgeons control
visualization directly through their own eye movements and
tissue manipulation, visualization during laparoscopy
relies heavily on an assistant who navigates the
laparoscope. In addition to other differences between open
and laparoscopic surgery, such as the fulcrum effect or
tactile feedback, the surgeon’s hand-eye coordination is
disturbed by the interposition of a camera, which moves
independently of the surgeon.



The camera is sometimes held by a medical student or
junior resident, who may be unfamiliar with the surgical
procedure, may stand in an uncomfortable position, or may
become fatigued or distracted. This results in the camera
rotating away from the horizon and/or inadvertently
drifting away from the surgical field, with increased
rates of surgical errors [39]. In fact, surgical errors
leading to injuries result mostly from misperception
rather than from a lack of knowledge or judgment [40].
Mechanical camera holders, passive or robotic, may provide
surgeons with a more stable image and enable them to
control their own view direction [41].

To improve the current mode of laparoscopic surgery, 
many mechanical scope positioning systems have been 
proposed [42]. The general idea is to have a robot holding 
the scope and responding to the positioning commands 
given by the surgeon through a speech interface system, a 
hand-held controller or a foot pedal, or other interface 
mechanisms. In this regard ‘choreographed’ scope 
maneuvering capability in laparoscopy was developed 
with active vision guidance [43]. 

To free the surgeon from the task of controlling the view 
and to automatically offer an optimal and stable view 
during laparoscopic surgery, several automatic camera 
positioning systems have been devised. These systems 
visually extract the shape and/or position of the surgical 
instrument from the laparoscopic images in real time, and 
automatically manipulate the laparoscope to center the tip 
of the instrument in the displayed image. These systems
are based on the simple idea that the surgeon’s region of
interest corresponds to the projected position of the tip
of the surgical tool in the laparoscopic image. Besides
centering on the area of interest, another important
factor that defines a good image of the surgical scene is
the depth of insertion of the laparoscope along its
longitudinal axis.
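
The centering idea described above can be sketched as a simple
proportional rule that maps the pixel offset of the detected tool tip
from the image center to pan and tilt commands. This is a minimal
illustration under stated assumptions, not the controller of any cited
system: the function name, the gain, the dead band and the sign
convention (positive pan moves the view to the right, positive tilt
moves it down) are all hypothetical.

```python
def centering_command(tip_px, image_size, gain=0.5, deadband_px=20):
    """Return (pan, tilt) commands, each within [-gain, gain].

    tip_px: (u, v) pixel position of the detected instrument tip.
    image_size: (width, height) of the laparoscopic image in pixels.
    Offsets inside the dead band produce no motion, so the camera does
    not chase small tool movements (an assumed design choice).
    """
    width, height = image_size
    du = tip_px[0] - width / 2.0
    dv = tip_px[1] - height / 2.0
    if (du * du + dv * dv) ** 0.5 <= deadband_px:
        return 0.0, 0.0
    # Normalize by half the image size so a tip at the border maps to +-gain.
    pan = gain * du / (width / 2.0)
    tilt = gain * dv / (height / 2.0)
    return pan, tilt


# Tip halfway between the center and the right border of a 640x480 image.
print(centering_command((480, 240), (640, 480)))
# Tip within the dead band: no camera motion is commanded.
print(centering_command((325, 245), (640, 480)))
```

Applied at every frame, a rule of this kind drives the projected tip
toward the image center; it says nothing about the insertion depth,
which is the separate factor noted at the end of the paragraph above.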

The pioneering studies of fully automatic camera
positioning systems defined the zooming ratio as a
“uniform” function of the estimated distance between the
tip of the tool and the laparoscope [44] or of the area
ratio between the visible tool and the whole image [45].
Although this approach can relieve the surgeon of the
camera control task, it does not provide the specific view
that the surgeon wants, because the required zooming
ratio varies widely during an operation. The best zooming
ratio depends on both the surgical procedure/phase and the
habits/preferences of the operating surgeon. For this
reason, most recently developed instrument tracking
systems [46], [47] have abandoned the idea of systematic
control of zooming parameters; instead, the surgeon is
required to define the parameters preoperatively or adjust
them intra-operatively through conventional human-machine
interfaces, which again means an extra control burden for
the surgeon.
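
The two zooming rules mentioned above, one driven by the tip-to-scope
distance and one by the tool-to-image area ratio, can be sketched as
proportional corrections of the insertion depth. The sketch is an
assumption-laden illustration rather than the method of [44] or [45]:
the function names, gains, step limit and the per-phase target table
are hypothetical. A single fixed target corresponds to a "uniform"
rule, while the lookup table hints at why phase- and surgeon-specific
targets are needed.

```python
def insertion_step_from_distance(tip_distance_mm, target_mm,
                                 gain=0.3, max_step_mm=5.0):
    """Advance (+) or retract (-) the scope toward a target tip distance."""
    step = gain * (tip_distance_mm - target_mm)
    return max(-max_step_mm, min(max_step_mm, step))


def insertion_step_from_area(tool_area_px, image_area_px, target_ratio,
                             gain_mm=40.0, max_step_mm=5.0):
    """Same idea using the fraction of the image covered by the tool.

    A ratio below target means the tool looks too small, so the scope
    advances; a ratio above target makes it retract.
    """
    ratio = tool_area_px / image_area_px
    step = gain_mm * (target_ratio - ratio)
    return max(-max_step_mm, min(max_step_mm, step))


# Hypothetical targets standing in for phase- and surgeon-specific
# preferences; a uniform rule would use one value for all phases.
TARGET_DISTANCE_MM = {"overview": 120.0, "suturing": 60.0}

# Same measured distance, different phases, opposite corrections.
print(insertion_step_from_distance(90.0, TARGET_DISTANCE_MM["suturing"]))
print(insertion_step_from_distance(90.0, TARGET_DISTANCE_MM["overview"]))
```

With the same measured distance, the suturing target asks the scope to
move in while the overview target asks it to move out, which is exactly
the ambiguity a single uniform function cannot resolve.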

To overcome this problem, [48] first investigated how the
camera assistant decides the zooming ratio of
laparoscopic images by analyzing in detail the positional
relationship between the laparoscope and the surgical
instrument during laparoscopic surgery. They extracted
the zooming behavior and implemented it in a robotic
laparoscope positioner. As a result, the zooming behavior
of their robotic system became very similar to that of the
human camera assistant. The proposed zooming motion was
found to be suitable for fast and compact operations
during laparoscopic surgery.

As previous research shows, an accurate view for the
surgeon during laparoscopic surgery is very important, but
this issue has received little attention. Although robotic
systems improve image quality during surgery in terms of
positioning and zooming compared with manual camera
control, more in-depth study is needed in this area.

III. FUTURE DIRECTIONS

As described in the previous section, the use of
laparoscopic surgery has increased rapidly during the past
two decades because it is much less traumatic than
conventional open surgery, which results in less
postoperative pain and a shorter recovery time. During
laparoscopic surgery, endoscopic instruments are passed
through small incisions in the abdominal wall to reach the
surgical site within the patient’s abdomen. A special
camera is attached to a long-stem laparoscopic lens to
provide an inside view of the surgical site, allowing the
surgeon to explore the intra-abdominal organs and
structures.

In conventional laparoscopic surgery, both hands of the
surgeon are engaged with surgical instruments, so the
laparoscopic camera is handled by an assistant who is
responsible for all camera control, such as holding and
maneuvering the laparoscope according to the surgeon’s
needs. This cooperation between camera operator and
surgeon requires a high degree of coordination, which is
not as simple to achieve and maintain as one might think,
given the long duration of surgery. There have been
efforts to facilitate camera manipulation during
laparoscopic procedures by employing robotic systems. The
major impact of these robots in laparoscopic surgery is to
reduce the need for assistive staff, to provide more space
for surgeon maneuvers, and to give the surgeon direct
control over the laparoscopic camera with high stability
and geometric accuracy and without fatigue or inattention.
The surgeon controls the motion of the endoscope using a
human-machine interface, e.g. a joystick, foot pedal,
voice commands or tracking of the surgeon’s head
movements.

Despite all these developments in laparoscopic surgery,
predicting the surgeon’s desired view, and hence the
camera position and zooming ratio during surgery, remains
an open research problem. Doing so requires a generalized
identification of surgeon gestures using task analysis.
The gesture classes we focus on are reach for needle,
position needle, insert and push needle through tissue,
move to middle with needle (left or right hand), pull
suture with right or left hand, and orient needle with
both hands. One important direction for this research area
is developing a quantitative model that can predict camera
position and zooming ratio based on surgeon gestures
validated through task analysis methods. Addressing this
problem requires extensive knowledge across several
fields, including the surgical tasks, gesture recognition,
and camera positioning and zooming. Finding an optimal
camera mode for the Fundamentals of Laparoscopic Surgery
(FLS) tasks will therefore require several different
methods, as discussed below.

As described before, in conventional laparoscopic
surgery, a human assistant controls the laparoscopic
image by directing the laparoscope at the operative field,
following the instructions of the surgeon. This task
requires active communication between the surgeon and
the assistant, which can give rise to confusion or
physical space conflicts. Because the surgeon must focus
on directing the assistant, he or she is distracted from
the actual operation. Furthermore, human camera control
may produce a suboptimal image due to tremor, off-center
drift or loss of horizontal orientation, so frequent
correction is required. Moreover, in almost all
laparoscopic surgery, images are highly magnified, so
slight hand trembling induces annoying jitter in the video
display. The result is both wasted operator effort and
risk to the patient.

On the other hand, it is possible to give the surgeon
direct control of his/her visual feedback, eliminating the
assistant. The procedure can thus be performed faster and
with greater ease. However, giving the surgeon direct
control has the undesired side effect that the surgeon is
distracted by having to maneuver the scope. Using a
robotic camera assistant in laparoscopic surgery has
proven to be beneficial in this case, improving both the
visual feedback and the surgeon’s control of the camera.

Altogether, current positioners rely completely on the
surgeon’s interactive commands, even within robotic
assistants, and lack the intelligence to automate camera
control. The question that arises here is: “Can we
anticipate the surgeon’s viewing needs and position the
scope without the surgeon’s intervention using task
analysis methods?” To address this question, we should
explore two more specific questions:

1. How can we predict the surgeon’s next gesture from
the previous gesture using dynamic real-time data
and task analysis methods?

2. How can the camera position and zooming level
during surgery be determined using task analysis
methods?
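
One simple way to frame the first question is as a first-order
transition model: count how often each gesture follows each other
gesture in recorded trials, then predict the most frequent successor.
The sketch below is only an illustrative baseline under that
assumption, not a method proposed in the literature reviewed here; the
function names, shortened gesture labels and trial sequences are
hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count gesture-to-gesture transitions over labeled trial sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return (most likely next gesture, estimated probability).

    Returns (None, 0.0) if the current gesture was never observed
    with a successor.
    """
    followers = counts.get(current)
    if not followers:
        return None, 0.0
    gesture, n = followers.most_common(1)[0]
    return gesture, n / sum(followers.values())


# Hypothetical annotated suturing trials using shortened gesture names.
trials = [
    ["reach", "position", "insert", "pull_left", "orient"],
    ["reach", "position", "insert", "pull_right", "orient"],
    ["reach", "position", "insert", "pull_left", "reach"],
]
counts = fit_transitions(trials)
print(predict_next(counts, "position"))
print(predict_next(counts, "insert"))
```

A model like this uses only the previous gesture label; combining it
with real-time kinematic data, as the question asks, would require a
richer model, but the transition table already exposes how predictable
a structured task such as suturing is.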

To answer both questions, we first need a deep knowledge
of the tasks and sub-tasks performed during surgery. For
this purpose we first limit ourselves to the suturing
task, one of the most important and complex surgical
tasks, and then try to generalize our model to other tasks
such as cutting or placement and securing of a ligating
loop. Another advantage of choosing the suturing procedure
is that we are able to anticipate the camera position and
zooming level, because the scope aiming and movements are
repetitive and follow a fixed pattern: the camera zooms in
when the surgeon is tying a knot and zooms out when the
surgeon is pulling on the suture. Thus, the general
question above becomes this specific question: “How can we
find the exact time when the surgeon is tying a knot, in
order to zoom in, or pulling on the suture, in order to
zoom out?” Although an overall anticipation is possible
based on the structure of the suturing procedure, a
precise prediction of the next camera position and zoom
level is desirable in the dynamic environment of surgery,
which may vary from surgeon to surgeon.
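
The zoom pattern described above (zoom in for knot tying, zoom out for
suture pulling) can be sketched as a rule table applied to a stream of
time-stamped gesture labels, emitting a zoom command at the moment the
recognized gesture changes. This is a minimal sketch that assumes an
upstream gesture recognizer already exists; the label names, the rule
table and the function name are hypothetical.

```python
# Assumed mapping from recognized gesture to zoom command.
ZOOM_FOR_GESTURE = {"tie_knot": "zoom_in", "pull_suture": "zoom_out"}


def zoom_events(labeled_samples):
    """Return (timestamp, command) pairs at gesture onsets.

    labeled_samples is an iterable of (timestamp, gesture) pairs in time
    order. A command is emitted only when the gesture changes to one
    that has a zoom rule, not on every sample.
    """
    events = []
    previous = None
    for timestamp, gesture in labeled_samples:
        if gesture != previous and gesture in ZOOM_FOR_GESTURE:
            events.append((timestamp, ZOOM_FOR_GESTURE[gesture]))
        previous = gesture
    return events


# Hypothetical recognizer output sampled every 0.5 s.
stream = [
    (0.0, "position_needle"),
    (0.5, "position_needle"),
    (1.0, "pull_suture"),
    (1.5, "pull_suture"),
    (2.0, "tie_knot"),
    (2.5, "tie_knot"),
    (3.0, "pull_suture"),
]
print(zoom_events(stream))
```

The timing of each command here is only as good as the recognizer that
produces the labels, which is why precise gesture onset detection is
central to the specific question posed above.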

In current laparoscopic surgery, the vision of the
operating surgeon usually depends on the camera assistant
responsible for guiding the laparoscope. The assistant
holds the laparoscope for the surgeon and positions the
scope according to the surgeon’s instructions. Commands
are often misinterpreted, which makes this method
frustrating and inefficient for the surgeon. Also, the
scope is sometimes aimed incorrectly and vibrates or
drifts because of the assistant, resulting in a suboptimal
and unstable view. Robotic technologies, specifically the
development of robotic laparoscope positioning systems,
are a major step toward solving this problem.

One important difference between robotic-assisted and
manual laparoscopic surgery is that control of the
endoscope transfers to the surgeon. In manual
laparoscopic surgery, another surgeon, resident or staff
member is responsible for this role. Although giving
camera control to the surgeon eliminates the need for an
additional assistant during the procedure, it adds a task
to an already overloaded surgeon. For this reason,
allowing a robot to automatically control the zoom based
on the surgeon’s task has great potential to improve the
surgeon’s performance. Ellis et al. [49], [50]
demonstrated that the zoom level had a significant effect
on surgeon performance. Removing the task of camera
control from the surgeon would reduce the surgeon’s
workload and could ensure quick and responsive camera
control.

IV. CONCLUSION

In this paper, we reported recent developments in
research on camera positioning in robotic-assisted
surgery, with a focus on various computational analytic
techniques. As the camera positioning problem is highly
multi-disciplinary, we presented the different related
areas, such as gesture recognition and segmentation, with
a focus on robotic-assisted surgery and surgeon gesture
classification based on task analysis. We then focused on
camera positioning and zooming level during laparoscopic
surgery, surveying various methods and algorithms in this
area. Overall, autonomous camera movement and positioning
for robotic-assisted surgery is still in its infancy. It
involves the cooperation of many disciplines. To
understand it better, for machines as well as for humans,
substantial research efforts in computer vision, machine
learning and psycholinguistics will be needed.